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Abstract 

An extension to classical unification, called graded unifica- 
tion is presented. It is capable of combining contradictory 
information. An interactive processing paradigm and parser 
based on this new operator are also presented. 
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Introduction 

Improved understanding of the nature of knowledge 
used in human language processing suggests the fea- 
sibility of interactive models in computational linguis- 
tics (CL). Recent psycholinguistic work such as (Stowe, 
1989; Trueswell ei al., 1994) has documented rapid em- 
ployment of semantic information to guide human syn- 
tactic processing. In addition, corpus-based stochas- 
tic modelling of lexical patterns (see Weischedel et al., 
1993) may provide information about word sense fre- 
quency of the kind advocated since (Ford et al., 1982). 
Incremental employment of such knowledge to resolve 
syntactic ambiguity is a natural step towards improved 
cognitive accuracy and efficiency in CL models. 

This exercise will, however, pose difficulties for the 
classical ('hard') constraint-based paradigm. As illus- 
trated by the Trueswell et al. (1994) results, this view 
of constraints is too rigid to handle the kinds of effects 
at hand. These experiments used pairs of locally am- 
biguous reduced relative clauses such as: 

1) the man recognized by the spy took off down the street 

2) the van recognized by the spy took off down the street 

The verb recognized is ambiguously either a past par- 
ticipial form or a past tense form. Eye tracking showed 
that subjects resolved the ambiguity rapidly (before 
reading the 6?/-phrase) in 2) but not in 1) ^. The con- 
clusion they draw is that subjects use knowledge about 
thematic roles to guide syntactic decisions. Since van, 
which is inanimate, makes a good Theme but a poor 
Agent for recognized, the past participial analysis in 
2) is reinforced and the main clause (past tense) sup- 
pressed. Being animate, man performs either thematic 
role well, allowing the main clause reading to remain 
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^In fact, ambiguity effects were often completely elimi- 
nated in examples like 2), with reading times matching those 
for the unambiguous case: 

■3) the man/van that was recognized by the spy ... 



plausible until the disambiguating by-phiase is encoun- 
tered. At this point, readers of displayed confusion. 

Semantic constraints do appear to be at work here. 
However, the effects observed by Trueswell et al. are 
graded. Verb-complement combinations occupy a con- 
tinuous spectrum of "thematic fit" , which influences 
reading times. This likely stems from the variance of 
verbs with respect to the thematic roles they allow (e.g.. 
Agent, Instrument, Patient, etc.) and the syntactic po- 
sitions of these. 

The upshot of such observations is that classical uni- 
flcation (see Shieber, 1986), which has served well as the 
combinatory mechanism in classical constraint-based 
parsers, is too brittle to withstand this onslaught of 
uncertainty. 

This paper presents an extension to classical unifl- 
cation, called graded unification. Graded uniflcation 
combines two feature structures, and returns a strength 
which reflects the compatibility of the information en- 
coded by the two structures. Thus, two structures 
which could not unify via classical uniflcation may unify 
via graded uniflcation, and all combinatory decisions 
made during processing are endowed with a level of 
goodness. The operator is similar in spirit to the op- 
erators of fuzzy logic (see Kapcprzyk, 1992), which at- 
tempts to provide a calculus for reasoning in uncertain 
domains. Another related approach is the "Uniflcation 
Space" model of Kempen & Vosse (1989), which unifles 
through a process of simulated annealing, and also uses 
a notion of uniflcation strength. 

A parser has been implemented which combines con- 
stituents via graded uniflcation and whose decisions are 
influenced by uniflcation strengths. The result is a 
paradigm of incremental processing, which maintains 
a feature-based system of knowledge representation. 

System Description 

Though the employment of graded uniflcation engen- 
ders a new processing style, the system's architecture 
parallels that of a conventional uniflcation-based parser. 

Feature Structures: Prioritized Features 

The feature structures which encode the grammar in 
this system are conventional feature structures aug- 
mented by the association of priorities with each 
atomic- valued feature. Prioritizing features allows 
them to vary in terms of influence over the strength of 
uniflcation. The priority of an atomic- valued feature fi 
in a feature structure X will be denoted by Pri( fi, X). 
The effect of feature prioritization is clarifled in the fol- 
lowing sections. 



Graded Unification 

Given two feature structures, the graded unification 
mechanism (Ug) computes two results, a unifying struc- 
ture and a unification strength. 

Structural Unification Graded unification builds 
structure exactly as classical unification except in the 
case of atomic unification, where it deviates crucially. 

Atoms in this framework are weighted disjunctive val- 
ues. The weight associated with a disjunct is viewed as 
the confidence with which the processor believes that 
disjunct to be the 'correct' value. Figures 1(a) and 1(b) 
depict atoms (where 1(a) is "truly atomic" because it 
contains only one disjunct). 
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Figure 1: Examples of Atoms 

Atomic unification creates a mixture of its two ar- 
gument atoms as follows. When two atoms are unified, 
the set union of their disjuncts is collected in the result. 
For each disjunct in the result, the associated weight be- 
comes the average of the weights associated with that 
disjunct in the two argument atoms. Figure 1(c) shows 
an example unification of two atoms. The result is an 
atom which is 'believed' to be SG (singular), but could 
possibly be PL (plural). 

Unification Strength The unification strength (de- 
noted UaStrength) is a weighted average of atomic uni- 
fication strengths, defined in terms of two sums, the 
actual compatthtUty and the perfect compatibility. 

If A and B are non-atomic feature structures to be 
unified, then the following holds: 



UaStrengthiA, B) 
The actual compatibility is the sum 



ActualC ompatibility(A,B) 
Per f ectC ompatibility(A,B) ' 



PrH.u,A)+Pr.(.U,B) ^ UaStrength(v,A , v,b) 

if fi shared by A and B 
Pri( fi, A) if fi occurs only in A 

Pri( fi, B) if fi occurs only in B 



where i indexes all atomic- valued features in A or 5, 
and ViA and ViB are the values of fi in A and B respec- 
tively. The perfect compatibility is computed by a 
formula identical to this except that UaStrength is set 
to 1. 

If A and B are atomic, then UaStrengthiA, B) is 
the total weight of disjuncts shared by A and B: 
UaStrength(A, B) = "^i Min(wiA, Wis) where i in- 
dexes all disjuncts di shared by A and B, and WiA and 
WiB are the weights of di in A and B respectively. 

By taking atomic unification strengths into account, 
the actual compatibility provides a raw measure of the 
extent to which two feature structures agree. By ig- 
noring unification strengths (assuming a value of i.o), 
the perfect compatibility is an idealization of the actual 
compatibility; it is what the actual compatibility would 
be if the two structures were able to unify via classical 



unification. Thus, unification strength is always a value 
between o and i. 

The Parser: Activated Chart Edges 

The parser is a modified unification-based chart parser. 
Chart edges are assigned activation levels, which repre- 
sent the 'goodness' of (or confidence in) their associated 
analyses. Each new edge is activated according to the 
strength of the unification which licenses its creation 
and the activations of its constituent edges. 

Constraining Graded Unification Without some 
strict limit on its operation, graded unification will over- 
generate wildly. Two mechanisms exist to constrain 
graded unification. First, if a particular unification 
completes with strength below a specified unification 
threshold, it fails. Second, if a new edge is constructed 
with activation below a specified activation threshold, 
it is not allowed to enter the chart, and is suspended. 



Parsing Strategy The chart is initialized to contain 
one inactive edge for each lexical entry of each word 
in the input. Lexical edges are currently assigned an 
initial activation of i.o. 

The chart can then be expanded in two ways: 

1. An active edge may be extended by unifying its first 
unseen constituent with the LHS of an inactive edge. 

2. A new active edge may be created by unifying the 
LHS of a rule with the first unseen constituent of some 
active edge in the chart (top down rule invocation). 




Figure 2: Extension of an Active Edge by an Inactive Edge 

Figure 2 depicts the extension of the active EDGEl with 
the inactive EDGE2. The characters represent feature 
structures, and the ovular nodes on the right end of 
each edge represent activation level. The parser tries 
to unify C" , the mother node of EDGE2, with C, the 
first needed constituent of EDGEl. If this unification 
succeeds, the parser builds the extended edge, EDGE3 
(where C Ua C" produces C"). The activation of the 
new edge is a function of the strength of the unification 
and the current activations of EDGEl and EDGE2: 
activs = ■ UgSTRENGTH(C, C) 
-\- ■ activi 

+ ■ activ2 (The weights sum to 1.) 

EDGe3 enters the chart only if its activation exceeds 
the activation threshold. Rule invocation is depicted in 
figure 3. The first needed constituent in EDGEl is uni- 
fied with the LHS of RULEl. EDGe2 is created to begin 
searching for C . The new edge's activation is again a 
function of unification strength and other activations: 
activi = wi ■ UgSTRENGTH{C, C") 

-|- W2 ■ activi 

+ ■ activ2 
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Figure 3: Top Down Rule Invocation 



The activation levels of grammar rule edges, like those 
for lexical edges, are currently pegged to i.o. 

A Framework for Interactive Processing 

The system described above provides a flexible frame- 
work for the interactive use of non-syntactic knowledge. 

Animacy and Thematic Roles 

Knowledge about animacy and its important function 
in the filling of thematic roles can be modelled as a 
binary feature, ANIMATE. A (active voice) verb can 
strongly 'want' an animate Agent by specifying that its 
subject be [animate -|-] and assigning a high priority to 
the feature ANIMATE. Thus, any parse combining this 
verb with an inanimate subject will suffer in terms of 
unification strength. A noun can be strongly animate 
by having a high weight associated with the positive 
value of ANIMATE. Animacy has been encoded in a toy 
grammar. However, principled settings for the priority 
of this feature are left to future work. 

Statistical Information from Corpora 

Corpus-based part-of-speech (POS) statistics can also 
be naturally incorporated into the current model. It 
is proposed here that a Viterbi decoder could be used 
to generate the likelihoods of the n best POS tags 
for a given word in the input string. Lexical chart 
edges would then be initially activated to levels pro- 
portional to the predicted likelihoods of their associ- 
ated tags. Since these activations will be propagated 
to larger edges, parses involving predicted word senses 
would consequently be given a head start in a race of ac- 
tivations. Attractively, this strategy allows a fuller use 
of statistical information than one which uses the in- 
formation simply to deterministically choose the n best 
tags, which are then treated as equally likely. 

Interaction of Diverse Information 

A crucial feature of this framework is its potential for 
modelling the interaction between sources of informa- 
tion like the two above when they disagree. Sentences 
1) and 2) again provide illustration. In such sentences, 
knowledge about word sense frequency supports the 
wrong analysis, and semantic constraints must be em- 
ployed to achieve the correct (human) performance. 

Intuitively, the raw frequency (without considering 
context) of the past tense form of recognized is higher 
than that of the past participial. POS taggers, despite 
considering local context, consistently mis-tag the verb 
in reduced relatives. The absence of a disambiguating 
relativizer (e.g., thai) is one obvious source of difficulty 
here. But even the ostensibly disambiguating prepo- 
sition by, is itself ambiguous, since it might introduce 



a manner or locative phrase consistent with the main 
clause analysis. ^ 

Modelling human performance in such contexts 
requires allowing thematic information to compete 
against and defeat word frequency information. The 
current model allows such competition, as follows. POS 
information may incorrectly predict the main clause 
analysis, boosting the lexical edge associated with the 
past tense, and thereby boosting the main clause parse. 
However, the unification combining the past tense form 
of recognized with an inanimate subject (van) will be 
weak, due to the constraints encoded in the verb's lexi- 
cal entry. Since the activations of constituent edges de- 
pend on the strengths of the unifications used to build 
them, the main clause parse will lose activation. The 
parse combining the past participial with an inanimate 
subject (Theme) will suffer no losses, allowing it to over- 
take the incorrect parse. 

Conclusions and Future Work 

Assigning feature priorities and activation thresholds 
in this model will certainly be a considerable task. It 
is hoped that principled and automated methods can 
be found for assigning values to these variables. One 
promising idea is to glean information about patterns 
of subcategorization and thematic roles from annotated 
corpora. Annotation of such information has been sug- 
gested as a future direction for the Treebank project 
(Marcus ei al., 1993). It should be noted that learning 
such information will require more training data (hence 
larger corpora) than learning to tag part of speech. 
In addition, psycholinguistic studies such as the large 
norming study 3 of MacDonald and Pearlmutter (de- 
scribed in Trueswell et al., 1994) may prove useful in 
encoding thematic information in small lexicons. 
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^In fact, the utility of fey is neutralized in the case of POS 
tagging, since prepositions are uniformly tagged (e.g., using 
the tag IN in the Penn Treebank; see Marcus et al., 1993). 

•^These studies attempt to establish thematic patterns 
by asking large numbers of subjects to answer questions like 
"How typical is it for a van to be recognized by someone?" 
with a rating between 1 and 7. 



